# Multi-dataset Training
Icedit Normal Lora
Other
This is an image-to-image conversion model based on LoRA technology, primarily used for non-commercial image editing tasks.
Image Generation English
RiverZ
1,046
7
Vitpose Plus Large
Apache-2.0
ViTPose++ is a vision Transformer-based foundation model for human pose estimation that achieves 81.1 AP on the MS COCO keypoint test set.
Pose Estimation
Transformers

usyd-community
1,731
1
Kazrush Ru Kk
Apache-2.0
kazRush-ru-kk is a Russian-to-Kazakh translation model based on the T5 configuration, trained on multiple open-source parallel datasets.
Machine Translation
Transformers Other

deepvk
332
8
Rad Dino Maira 2
Other
RAD-DINO-MAIRA-2 is a vision transformer model trained with DINOv2 self-supervised learning, specifically designed for encoding chest X-ray images.

microsoft
9,414
11
Wav2vec2 Large Robust 6 Ft Age Gender
This model, fine-tuned from Wav2Vec2-Large-Robust, can predict the speaker's age and gender from raw audio.
Audio Classification
Transformers

audeering
19.29k
2
Gpt2 Bangla Summurizer
This is a Bengali text summarization model based on the GPT2 architecture, specifically optimized for news content.
Text Generation
Transformers Other

faridulreza
18
0
Whisper Base Japanese
Apache-2.0
This model fine-tunes openai/whisper-base on the Common Voice, JVS, and JSUT datasets for Japanese speech recognition.
Speech Recognition
Transformers Japanese

Ivydata
137
3
Stt De Fastconformer Hybrid Large Pc
This is a German automatic speech recognition model based on the FastConformer architecture, trained with a hybrid Transducer and CTC approach, with approximately 115M parameters.
Speech Recognition German
nvidia
1,017
4
T5 Small Korean Summarization
A Korean text summarization model based on the T5 architecture, specifically optimized for Korean text to generate concise and accurate summaries.
Text Generation
Transformers Korean

eenzeenee
123
3
BENT PubMedBERT NER Gene
Apache-2.0
This is a named entity recognition model fine-tuned on PubMedBERT, specifically designed to identify gene and protein entities in biomedical texts.
Sequence Labeling
Transformers English

pruas
87
13
All MiniLM L6 V2 128dim
Apache-2.0
This is a sentence embedding model based on the MiniLM architecture, capable of mapping text to a 384-dimensional vector space, suitable for tasks such as semantic search and sentence similarity calculation.
Text Embedding English
freedomfrier
1,377
0
Whisper Small Cantonese
Apache-2.0
A Cantonese speech recognition model fine-tuned from OpenAI Whisper-small, achieving a CER of 7.93 on the Common Voice 16.0 test set.
Speech Recognition
Transformers Supports Multiple Languages

alvanlii
2,413
85
Stt Es Conformer Transducer Large
This is a large Conformer-Transducer model for Spanish automatic speech recognition, with approximately 120 million parameters, trained on 1340 hours of Spanish speech data.
Speech Recognition Spanish
nvidia
708
4
Stt Es Conformer Ctc Large
This is a large Conformer-CTC model for Spanish automatic speech recognition (ASR), trained and released by NVIDIA.
Speech Recognition Spanish
nvidia
59
2
Stt Fr Conformer Transducer Large
This is a large-scale Conformer-Transducer model for French automatic speech recognition, with approximately 120 million parameters, trained on over 1,500 hours of French speech data.
Speech Recognition French
nvidia
31
10
Stt De Conformer Ctc Large
This is a large-scale Conformer-CTC model for German automatic speech recognition, trained and optimized by NVIDIA on thousands of hours of German speech data.
Speech Recognition German
nvidia
132
4
Stt En Citrinet 1024 Gamma 0 25
NVIDIA Streaming Citrinet 1024 is a non-autoregressive model for English automatic speech recognition, based on CTC loss/decoding, with approximately 140 million parameters.
Speech Recognition English
nvidia
156
3
Densenet121 Res224 Chex
Apache-2.0
A pre-trained model based on the DenseNet121 architecture, specifically designed for chest X-ray image classification tasks with 18 output targets.
Image Classification
Transformers

torchxrayvision
25
1
All MiniLM L6 V2
Apache-2.0
This is a sentence embedding model based on sentence-transformers, capable of mapping text to a 384-dimensional vector space, suitable for semantic search and clustering tasks.
Text Embedding English
obrizum
1,647
5
Wav2vec2 Large Xlsr Galician
An automatic speech recognition model for Galician, fine-tuned from wav2vec2-large-xlsr-53, with a WER of 7.12.
Speech Recognition
Transformers

ifrz
9,330
1
Bp500 Base10k Voxpopuli
Apache-2.0
This is a Wav2vec 2.0 speech recognition model for Brazilian Portuguese, fine-tuned on multiple Brazilian Portuguese datasets.
Speech Recognition
Transformers Other

lgris
23
0
Wav2vec2 Large Xlsr 53 Japanese
Apache-2.0
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects audio input sampled at 16 kHz.
Speech Recognition Japanese
jonatasgrosman
2.9M
33
Wav2vec2 Xls R 1b German
Apache-2.0
This is a German automatic speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple German speech datasets, including Common Voice 8.0.
Speech Recognition
Transformers German

jonatasgrosman
105
3
Mt5 Small Sum De En V1
A bilingual summarization model based on multilingual T5 (mT5) that summarizes English and German text.
Text Generation
Transformers Supports Multiple Languages

deutsche-telekom
1,210
8
Bp400 Xlsr
Apache-2.0
A Wav2vec 2.0 model fine-tuned on Brazilian Portuguese datasets for automatic speech recognition.
Speech Recognition
Transformers Other

lgris
55
3
Wav2vec2 Base Turkish
Apache-2.0
This is a Wav2Vec2 speech recognition model fine-tuned on the Common Voice Turkish dataset for Turkish automatic speech recognition.
Speech Recognition
Transformers Other

cahya
49
4
Sbert Roberta Large Anli Mnli Snli
A sentence embedding model based on RoBERTa-large, designed for sentence similarity tasks and trained on the ANLI, MNLI, and SNLI datasets.
Text Embedding
Transformers English

usc-isi
38
2
W2v Hf Jsut Xlsr53
Apache-2.0
A Japanese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 using the Common Voice and JSUT datasets.
Speech Recognition
Transformers Japanese

qqpann
16
1
Wav2vec2 Large Xlsr 53 Chinese Zh Cn
Apache-2.0
A Chinese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects audio input sampled at 16 kHz.
Speech Recognition Chinese
jonatasgrosman
3.8M
110
Wav2vec2 Large Xlsr Open Brazilian Portuguese
Apache-2.0
This is a Wav2vec 2.0 model fine-tuned for Brazilian Portuguese on multiple open datasets, including Common Voice, MLS, and CETUC.
Speech Recognition
Transformers Other

lgris
395
9
Wav2vec2 Live Japanese
Apache-2.0
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 that produces hiragana output.
Speech Recognition
Transformers Japanese

ttop324
20
4
Wav2vec2 Xls R 1b Spanish
Apache-2.0
This is a Spanish automatic speech recognition model fine-tuned from the 1-billion-parameter XLS-R model on multiple Spanish datasets.
Speech Recognition
Transformers Spanish

jonatasgrosman
2,270
6
Camembert Squadfr Fquad Piaf Answer Extraction
MIT
This model is fine-tuned from CamemBERT-base, specifically designed for answer extraction tasks in French texts, trained on SquadFR, FQuAD, and PIAF datasets.
Question Answering System
Transformers French

lincoln
16
0
Wangchanberta Finetuned Sentiment
Apache-2.0
A model specialized in Thai text sentiment analysis, supporting positive, neutral, and negative sentiment classification.
Text Classification
Transformers Other

poom-sci
615
12
Distilbert Fa Zwnj Base Ner
A DistilBERT model fine-tuned for Persian Named Entity Recognition (NER) tasks, supporting recognition of 10 entity types.
Sequence Labeling
Transformers Other

HooshvareLab
101
4
Minilm L6 Mnli Fever Docnli Ling 2c
A binary natural language inference model trained on 8 NLI datasets, with a focus on long-text reasoning tasks.
Text Classification
Transformers English

MoritzLaurer
22
2
Wav2vec2 Large Japanese
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects input sampled at 16 kHz.
Speech Recognition Japanese
NTQAI
316
7